2020-02-19

Learning Objectives

  1. Plot vector geospatial data (points/polygons)
  2. Plot raster geospatial data (image)
  3. Extracting data from raster
  4. Making maps (Compass rose, overlays, etc)

Motivation

We are researchers who want to answer the question "What range of environmental conditions do taxon tend to inhabit?"

General workflow:

  • Plot locations of taxon
  • Plot map of environmental conditions
  • Plot contextual information
  • Extract data about environmental conditions at those locations

  • Variants of this question/workflow are applicable to others study systems.

–> –> –> –> –> –> –>

–> –> –> –> –> –> –> –> –> –> –> –> –>

Representation of spatial data

  • Vector data model uses points, lines, and polygons (e.g. locations of animals, ecosystem types)
  • Raster data model divides surface into cells of constant size (e.g. gridded temperature)
  • Non spatial attribute data
  • How many counts at a point?
  • Length of a transect
  • Land-cover type of a polygon

To answer our research question, we will need to:

  1. Read/write spatial data
  2. Represent geographic and attribute data
  3. Transform between different models of Earth
  4. Geographic operations
  5. Make maps

Spatial analysis in R

R packages for spatial analysis

  • Many tools for spatial analysis in R
  • rgdal released in 2003–import from more geographic data formats
  • sp released in 2005–creates spatial objects with classes and generic methods for points, lines, polygons, grids, and attribute data (but still lacked ability to do geometric operations)
  • raster released in 2010–support for raster operations and more

R packages linking to GIS software

  • GRASS, spgrass6, and rgrass7
  • RSAGA, RPyGeo, and RQGIS

Visualization

  • ggplot2
  • ggmap
  • rasterVis
  • tmap
  • leaflet
  • mapview

My choice: sf

  • Developed in 2016 with support from R consortium
  • Based on simple features ISO standards that are not R specific
  • Replaces sp, rdgal for read/write, and rgeos for geometric operations
  • Integrates well with tidyverse

Resources

PART I: PLOTTING POINT AND SHAPEFILE DATA

DATA: 1. taxon point locations, 2. relevant shapefile (Ecosystem type?, country?)

sf package

Simple features

  • feature "abstraction of real world phenomena"
  • simple feature "feature with all geometric attributes described by piecewise straight lines or planar interpolation between sets of points"
  • Represents geometry using points, lines, polygons, or collections of those features
  • ISO standard supported by lots of software

Basic structure

  • sf (simple feature) objects are extended data.frames or tibbles
  • attribute data stored as tibble
  • geometry stored as a list column
  • can have multiple types of geometry in one object
  • class is sfc (simple feature columns) with bounding box bbox and CRS (coordinate reference system) attributes

## Simple feature collection with 100 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -84.32385 ymin: 33.88199 xmax: -75.45698 ymax: 36.58965
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## # A tibble: 100 x 3
##     AREA NAME                                                      geometry
##    <dbl> <chr>                                           <MULTIPOLYGON [°]>
##  1 0.114 Ashe      (((-81.47276 36.23436, -81.54084 36.27251, -81.56198 36~
##  2 0.061 Alleghany (((-81.23989 36.36536, -81.24069 36.37942, -81.26284 36~
##  3 0.143 Surry     (((-80.45634 36.24256, -80.47639 36.25473, -80.53688 36~
##  4 0.07  Currituck (((-76.00897 36.3196, -76.01735 36.33773, -76.03288 36.~
##  5 0.153 Northamp~ (((-77.21767 36.24098, -77.23461 36.2146, -77.29861 36.~
##  6 0.097 Hertford  (((-76.74506 36.23392, -76.98069 36.23024, -76.99475 36~
##  7 0.062 Camden    (((-76.00897 36.3196, -75.95718 36.19377, -75.98134 36.~
##  8 0.091 Gates     (((-76.56251 36.34057, -76.60424 36.31498, -76.64822 36~
##  9 0.118 Warren    (((-78.30876 36.26004, -78.28293 36.29188, -78.32125 36~
## 10 0.124 Stokes    (((-80.02567 36.25023, -80.45301 36.25709, -80.43531 36~
## # ... with 90 more rows

sf and tidyverse

  • sf functions begin with st_
  • Methods for summary, plot
  • sf methods for filter, arrange, distinct, group_by, ungroup, mutate
methods(class = "sf")
##  [1] $<-                        [                         
##  [3] [[<-                       aggregate                 
##  [5] anti_join                  arrange                   
##  [7] as.data.frame              cbind                     
##  [9] coerce                     dbDataType                
## [11] dbWriteTable               distinct                  
## [13] filter                     full_join                 
## [15] gather                     group_by                  
## [17] group_map                  group_split               
## [19] identify                   initialize                
## [21] inner_join                 left_join                 
## [23] merge                      mutate                    
## [25] nest                       plot                      
## [27] print                      rbind                     
## [29] rename                     right_join                
## [31] sample_frac                sample_n                  
## [33] select                     semi_join                 
## [35] separate                   separate_rows             
## [37] show                       slice                     
## [39] slotsFromS3                spread                    
## [41] st_agr                     st_agr<-                  
## [43] st_area                    st_as_sf                  
## [45] st_bbox                    st_boundary               
## [47] st_buffer                  st_cast                   
## [49] st_centroid                st_collection_extract     
## [51] st_convex_hull             st_coordinates            
## [53] st_crop                    st_crs                    
## [55] st_crs<-                   st_difference             
## [57] st_force_polygon_cw        st_geometry               
## [59] st_geometry<-              st_interpolate_aw         
## [61] st_intersection            st_intersects             
## [63] st_is                      st_is_polygon_cw          
## [65] st_join                    st_line_merge             
## [67] st_linesubstring           st_make_valid             
## [69] st_minimum_bounding_circle st_nearest_points         
## [71] st_node                    st_normalize              
## [73] st_point_on_surface        st_polygonize             
## [75] st_precision               st_segmentize             
## [77] st_set_precision           st_simplify               
## [79] st_snap                    st_snap_to_grid           
## [81] st_split                   st_subdivide              
## [83] st_sym_difference          st_transform              
## [85] st_transform_proj          st_triangulate            
## [87] st_union                   st_voronoi                
## [89] st_wrap_dateline           st_write                  
## [91] st_zm                      summarise                 
## [93] transmute                  ungroup                   
## [95] unite                      unnest                    
## see '?methods' for accessing help and source code

st_geometry(nc) %>% plot 

# Default methods for objects
plot(nc)

plot(nc[,3])

filter(nc, AREA > 0.2)
## Simple feature collection with 11 features and 2 fields
## geometry type:  MULTIPOLYGON
## dimension:      XY
## bbox:           xmin: -80.06441 ymin: 33.88199 xmax: -76.49254 ymax: 36.06665
## epsg (SRID):    4267
## proj4string:    +proj=longlat +datum=NAD27 +no_defs
## # A tibble: 11 x 3
##     AREA NAME                                                      geometry
##  * <dbl> <chr>                                           <MULTIPOLYGON [°]>
##  1 0.219 Wake     (((-78.92107 35.57886, -78.99881 35.60132, -78.93889 35.~
##  2 0.201 Randolph (((-79.76499 35.50594, -80.06441 35.5057, -80.0426 35.91~
##  3 0.207 Johnston (((-78.53874 35.31512, -78.53947 35.33646, -78.60082 35.~
##  4 0.203 Beaufort (((-77.10377 35.55019, -77.11939 35.5855, -77.14835 35.5~
##  5 0.241 Sampson  (((-78.11377 34.72099, -78.11374 34.69918, -78.15676 34.~
##  6 0.204 Duplin   (((-77.68983 34.7202, -77.92667 34.71101, -77.93931 34.7~
##  7 0.24  Robeson  (((-78.86451 34.4772, -78.91947 34.45364, -78.95074 34.4~
##  8 0.225 Bladen   (((-78.2615 34.39479, -78.32898 34.36442, -78.43794 34.3~
##  9 0.214 Pender   (((-78.02592 34.32877, -78.13024 34.36412, -78.15479 34.~
## 10 0.24  Columbus (((-78.65572 33.94867, -79.0745 34.30457, -79.04095 34.3~
## 11 0.212 Brunswi~ (((-78.65572 33.94867, -78.63472 33.97798, -78.63027 34.~

Reading and writing data

Reading shapefiles

fileloc <- system.file("shape/nc.shp", package = "sf")
nc <- read_sf(fileloc)
  • st_crs() used to convert coordinates (do projections) and do datum transformations
  • Proj.4 string or ESPG code contains this info
  • Web repositories for ESPG and Proj.4 information

st_crs("+proj=longlat +datum=WGS84") # Proj.4 string"
## Coordinate Reference System:
##   EPSG: 4326 
##   proj4string: "+proj=longlat +datum=WGS84 +no_defs"
st_crs(3857)                         # ESPG code: public registry of spatial reference systems, 3857 = web-based mapping (i.e. Google                                         maps, OpenStreetMap)
## Coordinate Reference System:
##   EPSG: 3857 
##   proj4string: "+proj=merc +a=6378137 +b=6378137 +lat_ts=0.0 +lon_0=0.0 +x_0=0.0 +y_0=0 +k=1.0 +units=m +nadgrids=@null +wktext +no_defs"
st_crs(3857)$units                   # check units
## [1] "m"
st_crs(NA)                           # unknown (assumed planar/Cartesian)
## Coordinate Reference System: NA
units::set_units(a1, km^2)
## 1137.389 [km^2]
units::set_units(a2, km^2)
## 1137.598 [km^2]
units::set_units(a3, km^2)
## 1137.598 [km^2]

Other data sources

  • Convert from other Spatial* objects using st_as_sf
  • Convert long/lat data
  • Packages such as rnaturalearth

Convert long/lat data

head(hotsprings_df)
## # A tibble: 6 x 8
##   STATE   LAT  LONG SpringName          Temp_F Temp_C AREA    USGS_quad    
##   <chr> <dbl> <dbl> <chr>                <dbl>  <dbl> <chr>   <chr>        
## 1 CA     38.8 -123. LITTLE        GEYS~    210     99 SANTA ~ (WHISPERING ~
## 2 CA     41.5 -120. HOT        SPRINGS~    208     98 ALTURAS CEDARVILLE 15
## 3 CA     36.0 -118. COSO        HOT SP~    207     97 DEATH ~ HAIWEE RESER~
## 4 CA     41.7 -120. LAKE        CITY H~    207     97 ALTURAS CEDARVILLE 15
## 5 CA     36.0 -118. DEVILS        KITC~    207     97 DEATH ~ HAIWEE RESER~
## 6 CA     40.4 -121. TERMINAL        GE~    205     96 SUSANV~ MT. HARKNESS~

Convert long/lat data

  • Assume that these data are WGS84 (crs = 4326)
  • WGS84: World Geodic System 1984
  • 4326=coordinate system based on Earth's center of mass
hotsprings <- st_as_sf(hotsprings_df,
                       coords = c("LONG", "LAT"),
                       crs = 4326)
head(hotsprings)
## Simple feature collection with 6 features and 6 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -122.748 ymin: 36.036 xmax: -117.769 ymax: 41.67
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs
## # A tibble: 6 x 7
##   STATE SpringName  Temp_F Temp_C AREA   USGS_quad                 geometry
##   <chr> <chr>        <dbl>  <dbl> <chr>  <chr>                  <POINT [°]>
## 1 CA    LITTLE    ~    210     99 SANTA~ (WHISPER~        (-122.748 38.767)
## 2 CA    HOT       ~    208     98 ALTUR~ CEDARVIL~        (-120.078 41.534)
## 3 CA    COSO      ~    207     97 DEATH~ HAIWEE R~        (-117.769 36.045)
## 4 CA    LAKE      ~    207     97 ALTUR~ CEDARVIL~         (-120.206 41.67)
## 5 CA    DEVILS    ~    207     97 DEATH~ HAIWEE R~        (-117.802 36.036)
## 6 CA    TERMINAL  ~    205     96 SUSAN~ MT. HARK~        (-121.375 40.421)

Data from packages

world <- ne_countries(scale = "medium", returnclass = "sf")
st_geometry(world) %>% plot()

Writing data

  • shapefiles
  • database connections
  • use st_write

Exercises

  1. Load the roads shapefile (enproads.shp) and upload Zebra.csv and plot the geometry.
  2. If we look at our roads data, we see that there are several types of roads in the shapefile. How close or far from different road types do zebra move?

Exercise 1

roads <- st_read(here("data", "enproads.shp"), crs = "+init=epsg:4326") %>% #4326= coordinate system based on Earth's center of mass
  st_transform("+init=epsg:32733") #32733 = spatial reference for Nambia 
## Reading layer `enproads' from data source `C:\Users\mbuon\Desktop\QERM597-making-maps\data\enproads.shp' using driver `ESRI Shapefile'
## Simple feature collection with 565 features and 3 fields
## geometry type:  LINESTRING
## dimension:      XY
## bbox:           xmin: 14.35337 ymin: -19.48418 xmax: 17.15124 ymax: -18.5008
## epsg (SRID):    4326
## proj4string:    +proj=longlat +datum=WGS84 +no_defs

st_geometry(roads) %>% plot


zebra_sf <- read.csv(here("data", "Zebra.csv")) %>% 
  dplyr::select(ID = Name, 4:6) %>% 
  mutate(timestamp = as.POSIXct(lubridate::mdy_hm(Date))) %>%
  st_as_sf(., coords = 3:4, crs = "+init=epsg:4326") %>% #longlat, converting foreign object to SF object
  st_transform("+init=epsg:32733") #convert to UTM

ggplot() +
  geom_sf(data=roads) +
  geom_sf(data=zebra_sf, aes(color=ID))

Exercise 2

How close or far from different road types do zebra move?

unique(roads$TYPE)
## [1] Track  Graded Gravel Tar   
## Levels: Graded Gravel Tar Track

#Filter roads based off type:
large_roads <- filter(roads, TYPE %in% c("Tar", "Gravel"))
small_roads <- filter(roads, TYPE %in% c("Graded", "Track"))

ggplot() +
  geom_sf(data=large_roads, size=1.5) + 
  geom_sf(data=small_roads, size=0.6) + 
  geom_sf(data=zebra_sf, aes(color=ID))


# Find the minimum distance (in meters) of each point to a large road
large_dist<- st_distance(y=zebra_sf, x=large_roads) # a units matrix dim =[nrow(x), nrow(y)]; takes 10-20 seconds
zebra_sf$large_road_dist <- apply(large_dist, 2, min)

# Find the minimum distance (in meters) of each point to a small road
small_dist<- st_distance(y=zebra_sf, x=small_roads) # a units matrix dim =[nrow(x), nrow(y)]; takes 10-20 seconds
zebra_sf$small_road_dist <- apply(small_dist, 2, min)

head(data.frame(zebra_sf))
##      ID            Date           timestamp                 geometry
## 1 AG059 4/20/2009 15:40 2009-04-20 15:40:00 POINT (594243.6 7885501)
## 2 AG059 4/20/2009 13:01 2009-04-20 13:01:00   POINT (596576 7879266)
## 3 AG059 4/20/2009 12:02 2009-04-20 12:02:00 POINT (596579.5 7879260)
## 4 AG059 4/20/2009 17:00 2009-04-20 17:00:00   POINT (584733 7890124)
## 5 AG059  4/21/2009 1:40 2009-04-21 01:40:00 POINT (596611.3 7879269)
## 6 AG059  4/21/2009 1:00 2009-04-21 01:00:00 POINT (596605.8 7879266)
##   large_road_dist small_road_dist
## 1        7.845860       3479.8987
## 2      163.187201        241.8660
## 3      156.746071        245.8446
## 4        9.115608       1605.2154
## 5      151.769751        227.9759
## 6      151.358102        231.6984

ggplot(zebra_sf) +
  geom_histogram(aes(large_road_dist, fill="large roads"), alpha=0.5) +
  geom_histogram(aes(small_road_dist, fill="small roads"), alpha=0.5) +
  scale_fill_manual(labels=c("large roads", "small roads"), values=c("blue", "orange"))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

PART II: PLOTTING RASTER DATA AND EXTRACTING INFORMATION

DATA: 1. taxon point locations, 2. relevant shapefile (Ecosystem type?, country?)

PART III: MAKING NICE MAPS

Basics

  • Making maps with sf and ggplot2
  • interfaces with ggplot2 using geom_sf

Adding basemaps!

  • Focus on ggmap
  • As of mid-2018, requires registering with Google and obtaining an API key
  • Requires providing a valid credit card (yikes!), though charges are nonexistent or at least very minimal
  • See the ggmap GitHub page for more information about API keys

Basemap options from ggmap

  • Basemaps available from Google, Stamen, or Open Street Map
  • Terrain, satellite, or watercolor
  • See this helpful cheat sheet

Making maps using ggmap

Two steps:

  • Download basemap raster
  • Plot raster and overlay other spatial data

Register API key at start of session

This is my personal API key, which you can use for the purposes of this class!

register_google(key = "AIzaSyBRkw_wzDKuWZDikdD86Kmp1Sa6PzuCKFc")

Stamen basemaps

To add a Stamen basemap, first define the bounding box for the basemap you would like to download.

zebra_bbox <- c(14, -20, 17.5, -18)
names(zebra_bbox) <- c("left", "bottom", "right", "top")

Stamen basemaps: terrain

terr_basemap <- get_stamenmap(bbox=zebra_bbox, zoom=9, maptype="terrain")

ggmap(terr_basemap) +
  geom_sf(data=large_roads, size=1.5, inherit.aes = FALSE) + 
  geom_sf(data=small_roads, size=0.6, inherit.aes = FALSE) + 
  geom_sf(data=zebra_sf, aes(color=ID), inherit.aes = FALSE) +
  coord_sf(crs = st_crs(4326))

Stamen basemaps: watercolor

wat_basemap <- get_stamenmap(bbox=zebra_bbox, zoom=9, maptype="watercolor")

ggmap(wat_basemap) +
  geom_sf(data=large_roads, size=1.5, inherit.aes = FALSE) + 
  geom_sf(data=small_roads, size=0.6, inherit.aes = FALSE) + 
  geom_sf(data=zebra_sf, aes(color=ID), inherit.aes = FALSE) +
  coord_sf(crs = st_crs(4326))

Google basemaps

To add a Google basemap, first define the center coordinates for the basemap you would like to download.

zebra_center <- c(15.8, -19)
names(zebra_center) <- c("lon", "lat")

Google basemaps: satellite

sat_basemap <- get_googlemap(center=zebra_center, zoom=8, size=c(640, 640), scale=2,
                             maptype="satellite")
ggmap(sat_basemap) +
  geom_sf(data=large_roads, size=1.5, inherit.aes = FALSE) + 
  geom_sf(data=small_roads, size=0.6, inherit.aes = FALSE) + 
  geom_sf(data=zebra_sf, aes(color=ID), inherit.aes = FALSE) +
  coord_sf(crs = st_crs(4326))

Google basemaps: satellite

Notice that get_googlemap() returns a square basemap tile, which then sets the plotting limits of your map.

To manually set different plotting limits, pass in an empty "base layer" and set x and y limits using ggplot().

Google basemaps: satellite

ggmap(sat_basemap, maprange=FALSE, base_layer=ggplot(aes(x=1, y=1), data=NULL)) +
  xlim(14.3, 17.4) + ylim(-20, -18) + xlab("lon") + ylab("lat") +
  geom_sf(data=large_roads, size=1.5, inherit.aes = FALSE) + 
  geom_sf(data=small_roads, size=0.6, inherit.aes = FALSE) + 
  geom_sf(data=zebra_sf, aes(color=ID), inherit.aes = FALSE) +
  coord_sf(crs = st_crs(4326))

Adding basemaps: alternative approaches

  • An alternative package is RgoogleMaps
  • No need to get Google API key (yay!)
  • Does not seems to be compatible with ggplot2 (boo)